Objectives: Rare disease research requires a broad range of disease-related information to discover the causes of
genetic disorders, which are maladies caused by abnormalities in genes or chromosomes. The rarity of cases makes it
difficult for researchers to establish a definite origin for each disease. This knowledge base will be a major resource
not only for clinicians but also for the general public, who are unable to find consistent information on rare diseases
in a single location. Methods: We designed a compact database schema for faster querying; its structure is optimized to
store data from heterogeneous sources. Clinicians at Seoul National University Hospital (SNUH) then reviewed and revised
those resources. Additionally, we integrated other sources into the Korean Rare Disease Knowledge base (KRDK) to capture
genomic resources and clinical trials in detail. Results: We have developed a Web-based knowledge base, KRDK, suitable
for the study of Mendelian diseases that commonly occur among Koreans. This knowledge base comprises disease summaries
and reviews, a causal gene list, a laboratory and clinic directory, a patient registry, and so on. Furthermore, a
database for analyzing and accessing human biological information and a clinical trial management system are integrated
into KRDK. Conclusions: We expect that KRDK, the first rare disease knowledge base in Korea, may contribute to
collaborative research and be a reliable reference for application to clinical trials. Additionally, this knowledge base
is ready for querying drug information, so that visitors can search for a list of rare diseases related to specific
drugs. Visitors can access KRDK via http://www.snubi.org/software/raredisease/.

I. Introduction

Well-defined phenotypes of rare diseases have been actively studied to clarify their causes, even though
each rare disease occurs infrequently in the total human population [1]. As a result, a number of markers
have been discovered and researchers have identified genetic origins [2]. In particular, large-scale
sequencing initiatives have identified many more rare variants in Mendelian disorders [3,4]. Many
scientists are now drawn to uncovering unknown elements in this field because they believe it is the
first step toward curing rare chronic diseases.

Rare diseases are actually common despite their individual rarity [5]. In other words, there is a large
number of rare disease patients. According to EURORDIS, more than 5,000 distinct rare diseases are
estimated to exist, and more than 6% of Europeans are affected by them. In addition, there may be a
number of unreported cases, ultra-rare diseases, and a few patients who do not have obvious
characteristics or symptoms [6]. In some populations, the prevalence of rare diseases is dramatically
high due to genetic inheritance. Hence, researchers examine ethnic characteristics in order to find
clues to genetic diseases that occur mainly within particular populations [7].

Nevertheless, Korean researchers rely on foreign resources for information on rare diseases. Applying
treatment guidelines derived from a different ethnic group may be inappropriate for Korean patients.
A great number of health care providers and consumers are seeking valuable information [8]; however,
some online materials, such as laboratory and clinic directories, are of little help to domestic
researchers. A national data interchange hub is needed so that researchers have meaningful data on the
rare diseases that occur mainly among Koreans.

According to the Centers for Disease Control (CDC), approximately 500,000 people suffer from more than
110 kinds of rare diseases in Korea. Strictly speaking, approximately 0.1% of the Korean population is
affected by a few of these diseases, and for most of them no research has yet been started. Moreover,
it is hard for sufferers and their families to find resources such as genetic counseling, disease
treatment, and care center information. The only way to get information is to consult a doctor.

Development of Korean Rare Disease Knowledge Base

Therefore, we have developed the Korean rare disease knowledge base. Its aim is to support collaboration
between practitioners who are interested in the same subjects so that they achieve better results.
A rare disease knowledge base can give those affected general guidance for better understanding and
treatment. Additionally, this first step should draw attention to the neglect of orphan diseases. The
knowledge base comprises disease summaries, review articles, genetic variations, a laboratory and
clinic directory, a patient registry, and domestic research. Although the drug database is not yet
available in the Korean Rare Disease Knowledge base (KRDK), the database schema was designed to store
lists of drugs associated with genes. As the last step, we developed a user-friendly Web-based
interface that allows users to search and find knowledge with a few clicks.

Figure 1. An optimized database schema for Korean Rare Disease Knowledge base (KRDK). A well-structured database
preserves data integrity and avoids data redundancy, and an appropriately organized data structure makes searching
effective.

II. Methods

1. Database Schema 

GeneTests is a free online medical genetics information resource developed at the University of
Washington (http://www.genetests.org) [9]. We referred to its schema when designing our own and built
an optimized database schema for storing disease summaries, reviews, genetic variants, the laboratory
and clinic directory, the patient registry, and domestic research in KRDK. The schema helps queries
execute quickly and avoids redundant data. Additionally, we planned for the storage of drug
information, which is closely connected to target genes and diseases. As a result, the extended
database schema is ready for disease, gene, and drug relations, and queries based on these relations
can be executed (Figure 1).
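
As a rough illustration of how disease, gene, and drug relations can be stored without redundancy, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are assumptions made for this example; they are not the actual KRDK schema shown in Figure 1. The sample row uses the NTRK1 entry that appears in Figure 2.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are assumptions
# for illustration, not the actual KRDK schema.
SCHEMA = """
CREATE TABLE disease (
    disease_id INTEGER PRIMARY KEY,
    name_en    TEXT NOT NULL UNIQUE,
    name_ko    TEXT
);
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE,
    locus   TEXT,
    protein TEXT
);
CREATE TABLE drug (
    drug_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- Link tables store each relation exactly once, so entity rows are not repeated.
CREATE TABLE disease_gene (
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (disease_id, gene_id)
);
CREATE TABLE drug_gene (
    drug_id INTEGER NOT NULL REFERENCES drug(drug_id),
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (drug_id, gene_id)
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def genes_for_disease(conn, disease_name):
    """Return the gene symbols linked to a disease, in alphabetical order."""
    rows = conn.execute(
        """
        SELECT g.symbol
        FROM gene AS g
        JOIN disease_gene AS dg ON dg.gene_id = g.gene_id
        JOIN disease AS d ON d.disease_id = dg.disease_id
        WHERE d.name_en = ?
        ORDER BY g.symbol
        """,
        (disease_name,),
    ).fetchall()
    return [symbol for (symbol,) in rows]


conn = create_database()
conn.execute(
    "INSERT INTO disease (disease_id, name_en) VALUES (1, ?)",
    ("Hereditary sensory and autonomic neuropathy, type 4",),
)
conn.execute(
    "INSERT INTO gene (gene_id, symbol, locus, protein) VALUES (1, 'NTRK1', '1q23.1', ?)",
    ("High affinity nerve growth factor receptor",),
)
conn.execute("INSERT INTO disease_gene (disease_id, gene_id) VALUES (1, 1)")
print(genes_for_disease(conn, "Hereditary sensory and autonomic neuropathy, type 4"))
```

In this sketch the composite primary keys on the link tables reject a duplicated relation, which is one simple way to obtain the redundancy avoidance described above.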

2. Data Resource 

Orphanet, which is run by a consortium of European partners, is a publicly available database of
information on rare diseases and orphan drugs (http://www.orpha.net) [10]. It provides general
information about diseases and their genetic data. Eight practitioners at Seoul National University
Hospital (SNUH) consulted and revised Orphanet review articles and other public databases and then
arranged a set of structured, revised data; we stored over 500 disease summaries and 48 reviews
translated into Korean. We collected the national laboratory and clinic directory by contacting experts
and directors. Once the data had been supplied voluntarily, the Research Center for Rare Diseases
(RCRD, http://rarediseasecenter.org) confirmed and guaranteed them; the data contain the laboratory and
clinic names, the directors' names, contacts, and detailed information. Specifically, the laboratory
directors listed provide molecular genetic testing for specific disorders. Finally, we have attempted
to make the listing of Korean clinical laboratories comprehensive and have stored reliable information
so that directory lists can be offered when a specific disease is searched.

Bio Electronic Medical Record (BioEMR) is a clinical trial management system
(http://bioemr.snubi.org:8080/bioemr/rdrc/) that was developed at Seoul National University Biomedical
Informatics (SNUBI) [11]. Patient records in BioEMR, however, are not openly accessible because they
contain private information. Accordingly, we offer only a summary of the patient registry for each
disease. Detailed registry data can be accessed by contacting RCRD. Genome Research Information
Pipeline (GRIP) is an integrated database for analyzing and accessing biological information on human,
mouse, and rat (http://grip.snubi.org). GRIP consists of a number of major biological databases and
contains information about sequences, genes, proteins, gene families, protein families, and enzymes.
We integrated KRDK with GRIP to capture all of the biological information related to a gene name or
symbol.

III. Results 

1. Data Collection 

KRDK contains 520 rare disease entries and 48 disease reviews. In addition, 6 laboratory directories
include testing methods for 303 inherited diseases and 184 genes, and 35 specialists in rare diseases
in Korea are listed. There are 839 genes known to affect rare diseases, for which the chromosomal
location and protein name are also provided. KRDK also contains 12 patient registries and 22 ongoing
domestic research studies with study title, director contact, and so on (Table 1).

2. Search Input 

Users may search for a specific disease by its name, which does not need to be the full name of the
disease, in either Korean or English. KRDK can be searched by disease name, gene symbol, or protein
name to explore a certain rare disease, and a disease name can be used to find the related laboratory
and clinic directory. Alternatively, users may browse all registered rare diseases by clicking the
'All Disease' button at the bottom.

3. Search Results 

Search results are displayed in alphabetical order for all diseases. KRDK comprises well-defined
categories with highly structured sections (Figure 2). For every disease, a search result shows 5
buttons (summary, testing, clinic, registry, and research), and the color of each button represents
the availability of its content: blue signifies an activated button and gray an inactivated one.
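
The button coloring rule can be expressed as a small mapping from the sections that have content to a color per button. The following is a minimal sketch under that reading; the function name `button_states` and the use of plain strings for colors are assumptions for illustration.

```python
# The five result buttons named in the text, in display order.
SECTIONS = ("summary", "testing", "clinic", "registry", "research")


def button_states(available_sections):
    """Map each button to 'blue' (content available) or 'gray' (no content)."""
    available = set(available_sections)
    unknown = available - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown section(s): {sorted(unknown)}")
    return {name: ("blue" if name in available else "gray") for name in SECTIONS}


# Hypothetical disease with a summary, a clinic entry, and a registry only.
print(button_states({"summary", "clinic", "registry"}))
```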



Table 1. The number of entries in each category

Category                      No. of entries
Disease summary               520
Review articles               48
Affected genes                839
Laboratory directory          6
Clinic directory              26
Practitioner                  35
Patient's registry summary    12
Domestic research             22

[Figure 2 image: KRDK search result page for "Hereditary sensory and autonomic neuropathy, type 4", with the fields
Genetic Variant(s) (Gene Symbol: NTRK1; Chromosome Location: 1q23.1; Protein Name: High affinity nerve growth factor
receptor), Prevalence (Unknown), Inheritance (Autosomal recessive), Age of Onset (Neonatal/infancy), ICD 10 Code
(G60.8), MIM Number, Synonym(s) (HSAN4; Insensitivity to pain - anhidrosis), and Summary, shown alongside the linked
GRIP and OMIM pages (MIM ID #256800).]

Figure 2. The search result page for a disease and additional information. The summary section gives a brief overview of
the disease. Clicking a gene symbol connects directly to Genome Research Information Pipeline (GRIP) for more
information related to the specific gene, such as protein, protein family, and pathway. Moreover, each Mendelian
Inheritance in Man (MIM) number is hyperlinked to Online MIM (OMIM) so that users can browse the Website instantly.



1) Rare disease summary and review 

The summary section provides the rare disease name, prevalence, inheritance, age of onset,
International Statistical Classification of Diseases (ICD) 10 codes, Mendelian Inheritance in Man (MIM)
number, disease synonyms, and a summarized article on the disease in Korean. Disease-related genetic
information is shown on the same page and includes the gene symbol, chromosomal location, and protein
name. Additionally, every MIM number is hyperlinked to the Online MIM (OMIM) Website, and each gene
symbol links to GRIP for more information related to the specific gene. Each review includes the
definition, prevalence, mechanism, diagnosis, treatment, genetic counseling, and prenatal diagnosis.
The reviews draw on previous research and were written by a consultant for each specific disease.



2) Laboratory and clinic directory 

The laboratory and clinic directories contain domestic information only. The laboratory directory
focuses on the tests used in diagnosis and the target genes for a specific disease, and it also gives
contact information (Figure 3A). The clinic directory provides not only the clinic name but also the
names and contacts of practitioners who specialize in a specific rare disease (Figure 3B).

3) Patient's EMR

BioEMR is a patient registry developed at SNUBI, and it is not publicly accessible because it contains
patient information. Consequently, we provide the name of the registry, the number of registered
patients, the number of registered genes, the name of the director, contact information, and the
BioEMR URL so that scientists who want detailed patient reports can request the data and refer to
individual patient records (Figure 4).

[Figure 3A image: laboratory directory page for "Autosomal dominant optic atrophy, classic type", listing laboratories
(including Samsung Medical Center) by testing method with the target gene OPA1; the remaining labels are illegible in
the source.]

[Figure 3B image: clinic directory page for "Alport syndrome", with the fields Director, Clinic (Seoul National
University Hospital, Children's Hospital), Department (Division of Nephrology), Address, Contact, and Homepage.]

Figure 3. An example of laboratory and clinic directory pages. Search results for the laboratory and clinic
directories. (A) All laboratories are listed with their available testing methods and target genes. Additionally, by
clicking a laboratory name, visitors can get detailed information such as location and contact. (B) The clinic
directory shows the name of a practitioner who specializes in a specific disease and other information such as clinic
name and contact.

[Figure 4 image: patient registry summary for "Alport syndrome", with the fields Registry name (Korean Alport syndrome
patients registry), Initial / Last update (1998-01-06 / 2010-12-30), The number of registered patients / gene
(144/75), Director, Email, Contact, Patient's registry URL, and References.]

Figure 4. The summary of a patient registry and the link to its original Web page. The patient registry shows only a
summary of records because of the privacy policy. The summary comprises the number of registered patients and genes,
director, contact, references, and so on. Additionally, the patient registry URL links directly to the registry, and
visitors can gain full access with permission from Research Center for Rare Diseases (RCRD).

4) Rare disease domestic research 

Lastly, we investigated and stored summaries of national rare disease studies. This section includes
the study title, institute, subject-related references, and the director's name and contact. A
clinician who is interested in a specific rare disorder will thus have a chance to cooperate for a
better research outcome. It also helps different groups avoid conducting two similar experiments. As
domestic research grows, more and more people will pay attention to genetic disorders.

4. A Web-Based Online Submission Tool 

A unified format makes it easy for a data handler to collect data without information loss and to keep
the data consistent. However, each institute has its own recording format, and each format needs to be
modified depending on the purpose of use. Hence, we offer a free Web-based online submission tool with
a single design for updating and modifying investigated information and test results. Only experts may
submit knowledge, so access to the submission tool is restricted and requires a sign-in process,
because untested information could mislead inexperienced practitioners in clinical matters of life and
death.
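
The two constraints described above, a single unified format and submission by signed-in experts only, can be sketched as one validation step. This is a minimal illustration and not the actual submission tool: the field names, the category list, and the `validate_submission` function are all hypothetical.

```python
# Hypothetical unified format: every submission carries the same fields.
REQUIRED_FIELDS = ("disease_name", "category", "content", "submitter_id")
CATEGORIES = ("summary", "testing", "clinic", "registry", "research")


class SubmissionError(ValueError):
    """Raised when a submission does not satisfy the unified format."""


def validate_submission(record, signed_in_experts):
    """Check a submitted record and return it normalized to the unified format."""
    missing = [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]
    if missing:
        raise SubmissionError(f"missing field(s): {', '.join(missing)}")
    if record["submitter_id"] not in signed_in_experts:
        raise SubmissionError("submitter is not a signed-in expert")
    if record["category"] not in CATEGORIES:
        raise SubmissionError(f"unknown category: {record['category']}")
    # Keep only the agreed fields and trim whitespace, so that records from
    # different institutes end up with an identical shape.
    return {f: str(record[f]).strip() for f in REQUIRED_FIELDS}


accepted = validate_submission(
    {
        "disease_name": " Alport syndrome ",
        "category": "clinic",
        "content": "Division of Nephrology",
        "submitter_id": "expert01",        # hypothetical account name
        "local_record_no": "A-17",         # institute-specific extra, dropped
    },
    signed_in_experts={"expert01"},
)
print(accepted)
```

Rejecting incomplete or unauthenticated records at submission time keeps unreviewed material out of the knowledge base before expert confirmation takes place.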

IV. Discussion 

This paper introduces the first rare disease knowledge base for Koreans. It comprises comprehensive
rare disease information, genetic variants, disease reviews, a laboratory directory for molecular
genetic testing, a clinic directory for diagnosis and care, patient registries, and ongoing rare
disease studies. Such resources are scattered online; therefore, we developed a rare disease knowledge
base integrated with a patient registry database and a database for accessing human biological
information. Although there are not yet enough data for those who are interested in rare diseases, we
expect the amount of input data to increase exponentially before long.

[Figure 5 diagram: the rare disease knowledge base, the drug database, and the genetic information database, connected
by relationships.]

Figure 5. Drug information that is closely connected to the rare disease and genetic information databases. Korean Rare
Disease Knowledge base (KRDK) is ready to store drug information. Drugs are closely connected to target genes and
specific diseases. With this relationship, physicians can perform several possible queries: What drugs are related to
Wilson disease? What target genes does penicillamine have? What genes affect Wilson disease? By answering such
queries, clinicians can carry out research efficiently with this integrative database scheme.

KRDK integrates other sources such as GRIP and BioEMR in order to provide comprehensive disease
information. Consequently, physicians no longer need to spend time gathering patient records and
consulting other biological databases for more information. By clicking a gene symbol, visitors
instantly get information about the gene family, protein, protein family, enzyme, and so on. KRDK is
also able to store drug information, with each drug related to its target genes and diseases. A query
by drug name returns the list of diseases and target genes, and vice versa (Figure 5). Although we
focused on gathering rare disease sources in KRDK, our primary concern has been data reliability.
Experts double-checked all disease summaries and reviews in order to provide accurate knowledge. Hence,
not only physicians but also patients who may not know much about rare diseases will get useful and
practical information from this knowledge base.
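
The two-way lookup between drugs, target genes, and diseases described above can be sketched with a small in-memory index. This is a minimal illustration under one stated assumption: a drug is related to a disease when they share at least one gene. The `RelationIndex` class is hypothetical, and the identifiers in the sample data are placeholders rather than curated biomedical facts.

```python
from collections import defaultdict


class RelationIndex:
    """In-memory index of disease-gene and drug-gene links (illustrative only)."""

    def __init__(self):
        self._genes_by_disease = defaultdict(set)
        self._diseases_by_gene = defaultdict(set)
        self._genes_by_drug = defaultdict(set)
        self._drugs_by_gene = defaultdict(set)

    def add_disease_gene(self, disease, gene):
        self._genes_by_disease[disease].add(gene)
        self._diseases_by_gene[gene].add(disease)

    def add_drug_gene(self, drug, gene):
        self._genes_by_drug[drug].add(gene)
        self._drugs_by_gene[gene].add(drug)

    def genes_for_disease(self, disease):
        """Which genes affect this disease?"""
        return sorted(self._genes_by_disease.get(disease, ()))

    def genes_for_drug(self, drug):
        """Which target genes does this drug have?"""
        return sorted(self._genes_by_drug.get(drug, ()))

    def drugs_for_disease(self, disease):
        """Which drugs are related to this disease through a shared gene?"""
        drugs = set()
        for gene in self._genes_by_disease.get(disease, ()):
            drugs |= self._drugs_by_gene.get(gene, set())
        return sorted(drugs)

    def diseases_for_drug(self, drug):
        """Which diseases are related to this drug through a shared gene?"""
        diseases = set()
        for gene in self._genes_by_drug.get(drug, ()):
            diseases |= self._diseases_by_gene.get(gene, set())
        return sorted(diseases)


# Placeholder identifiers only; none of these links is real curated data.
index = RelationIndex()
index.add_disease_gene("disease_x", "GENE_A")
index.add_disease_gene("disease_x", "GENE_B")
index.add_disease_gene("disease_y", "GENE_B")
index.add_drug_gene("drug_1", "GENE_B")
index.add_drug_gene("drug_2", "GENE_C")

print(index.drugs_for_disease("disease_x"))
print(index.genes_for_drug("drug_1"))
print(index.diseases_for_drug("drug_1"))
```

The three query methods correspond to the three example questions in the Figure 5 caption; in a relational database the same lookups would be joins across the gene link tables.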

The online submission tool makes it easy to collect more information from various resources; submitted
materials therefore need to be confirmed by experts. A group of experts on rare diseases is needed to
verify submitted data so that reliable information can be offered. With a sufficient amount of credible
data, we expect that this knowledge base will contribute to collaborative research and be applied to
clinical trials. Creating a Korean knowledge base to improve the understanding of rare diseases among
Koreans provides a most valuable infrastructure for the research field.

More and more variants have been identified by next-generation sequencing technology [12,13]. There is
great interest in investigating whole-exome sequencing data to decipher rare variants that may play a
key role in the etiology of rare diseases. Exome sequencing has become affordable, and recently so has
whole-genome sequencing. Variants affecting a rare disease can be distinguished by trio or quartet
sequencing within a family. We assume that variants discovered in this way can also serve as warning
signs for others of the same ethnic group and that more and more causal variants will be identified
from the analysis of sequencing data. Consequently, we are considering building a database of genetic
variants from exome and whole-genome sequencing data so that practitioners or patients can compare
their own sequenced data against KRDK.